33. CNNs for Image Classification


Padding

Padding is just adding a border of pixels around an image. In PyTorch, you specify the size of this border.

Why do we need padding?

When we apply a convolutional layer, we slide a square kernel over an image, using a center pixel as an anchor, so the kernel cannot fully overlap the edges and corners of the image. Padding addresses this and, more importantly, lets us control the spatial size of the output volumes. Most commonly, as we'll see soon, we will use it to exactly preserve the spatial size of the input volume, so the input and output width and height are the same.

The most common methods of padding are padding an image with all-zero pixels (zero padding) or padding it with the nearest edge pixel value. You can read more about calculating the amount of padding, given a kernel_size, here.
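
As a quick illustration, here is a minimal sketch of both padding methods and of size-preserving convolution in PyTorch. The tiny 4x4 input and the choice of 8 output channels are made up just for this example:

```python
import torch
import torch.nn as nn

# A hypothetical 1x1x4x4 input (batch, depth, height, width), just for illustration
x = torch.arange(16.0).reshape(1, 1, 4, 4)

# Zero padding: add a 1-pixel border of zeros on all sides -> 6x6
zero_padded = nn.ZeroPad2d(1)(x)

# Replication padding: the border repeats the nearest edge pixel instead
edge_padded = nn.ReplicationPad2d(1)(x)

print(zero_padded.shape, edge_padded.shape)  # both torch.Size([1, 1, 6, 6])

# With a 3x3 kernel, padding=1 exactly preserves the input's x-y size
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
print(conv(x).shape)  # torch.Size([1, 8, 4, 4]) -- same 4x4 height and width
```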

How might you define a maxpooling layer, such that it down-samples an input by a factor of 4? (A checkbox indicates that you should select ALL answers that apply.)

SOLUTION:
  • `nn.MaxPool2d(2, 4)`
  • `nn.MaxPool2d(4, 4)`
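
As a sanity check, here is a small sketch (the 16x16 input is arbitrary) confirming that both definitions shrink the x-y dimensions by a factor of 4, since `nn.MaxPool2d(kernel_size, stride)` down-samples by its stride:

```python
import torch
import torch.nn as nn

# A hypothetical 1x1x16x16 input, just to check the output sizes
x = torch.randn(1, 1, 16, 16)

# A stride of 4 shrinks the height and width by 4x, regardless of kernel size 2 or 4
print(nn.MaxPool2d(2, 4)(x).shape)  # torch.Size([1, 1, 4, 4])
print(nn.MaxPool2d(4, 4)(x).shape)  # torch.Size([1, 1, 4, 4])
```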

If you want to define a convolutional layer whose output is the same x-y size as the input array, what padding should you have for a kernel_size of 7? (You may assume that other parameters are left as their default values.)

SOLUTION: `padding=3`
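
A quick sketch, assuming an arbitrary 28x28 grayscale input, shows why this works: with the default stride of 1, a padding of (kernel_size - 1) / 2 preserves the x-y size.

```python
import torch
import torch.nn as nn

# kernel_size=7 -> padding=(7 - 1)/2 = 3 keeps the output the same x-y size as the input
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=7, padding=3)

x = torch.randn(1, 1, 28, 28)  # hypothetical 28x28 grayscale input
print(conv(x).shape)           # torch.Size([1, 1, 28, 28]) -- unchanged x-y size
```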

PyTorch Layer Documentation

Convolutional Layers

We typically define a convolutional layer in PyTorch using nn.Conv2d, with the following parameters specified:

`nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)`
  • in_channels refers to the depth of an input. For a grayscale image, this depth = 1
  • out_channels refers to the desired depth of the output, or the number of filtered images you want to get as output
  • kernel_size is the size of your convolutional kernel (most commonly 3 for a 3x3 kernel)
  • stride and padding have default values, but should be set depending on how large you want your output to be in the spatial dimensions x, y

Read more about Conv2d in the documentation.
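
For example, here is a minimal sketch of a convolutional layer defined with these parameters; the 32x32 grayscale input and the choice of 16 output channels are arbitrary, just to show how the parameters shape the output:

```python
import torch
import torch.nn as nn

# A grayscale input (in_channels=1) filtered into 16 feature maps with a 3x3 kernel;
# padding=1 keeps the x-y size unchanged, and stride=1 is the default
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 1, 32, 32)  # (batch, depth, height, width)
print(conv(x).shape)           # torch.Size([1, 16, 32, 32]) -- depth 16, same 32x32 size
```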

Pooling Layers

Maxpooling layers commonly come after convolutional layers to shrink the x-y dimensions of an input. Read more about pooling layers in PyTorch here.
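
Putting the two layer types together, here is a minimal sketch of a hypothetical conv + pool block; the layer sizes and the 28x28 grayscale input are only illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A conv layer that preserves the x-y size (padding=1), followed by a maxpooling
# layer that halves it
class ConvPoolBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = F.relu(self.conv(x))  # 16 filtered feature maps, same x-y size as the input
        return self.pool(x)       # x-y size halved by the pooling layer

x = torch.randn(1, 1, 28, 28)    # hypothetical 28x28 grayscale image
print(ConvPoolBlock()(x).shape)  # torch.Size([1, 16, 14, 14])
```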